“A problem is a chance for you to do your best.” Duke Ellington
In this report we explore and clean data on patients admitted to the ICU with problems related to heart disease. The scope of the study is to explore the data and to identify and fix quality problems, so that we obtain a clean input for selecting or transforming the features that give us the most information to run a machine learning model in phase 2.
In this exercise we use the MIMIC-IV dataset, a publicly available database containing patient measurements, diagnoses, and other classification information.
For this first part, we are going to explore the general characteristics of the dataset.
##
##
## | x|
## |----:|
## | 6377|
## | 207|
Comments: For this particular version of the data we have 6377 observations and 207 variables.
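The dimensions and variable names shown in this part could be obtained along these lines. This is only a minimal sketch: the data frame `icu` and its values are hypothetical stand-ins, since the code that loads the MIMIC-IV extract is not shown in this report.

```r
# Hypothetical stand-in for the MIMIC-IV extract (values are made up).
icu <- data.frame(
  subject_id = c(1L, 2L, 3L),
  gender     = factor(c("F", "M", "M")),
  age        = c(61, 70, 79)
)

# dim() returns c(number of rows, number of columns).
dims   <- dim(icu)
n_obs  <- dims[1]
n_vars <- dims[2]
cat("Observations:", n_obs, "- Variables:", n_vars, "\n")

# names() lists the column (variable) names.
var_names <- names(icu)
print(var_names)
```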
##
##
## |x |
## |:-----------------------------------------------------------------|
## |subject_id |
## |gender |
## |age |
## |mortality |
## |ethnicity |
## |Heart.Rate |
## |Heart.rate.Alarm...High |
## |Heart.Rate.Alarm...Low |
## |Arterial.Blood.Pressure.systolic |
## |Non.Invasive.Blood.Pressure.systolic |
## |Arterial.Blood.Pressure.diastolic |
## |Non.Invasive.Blood.Pressure.diastolic |
## |Respiratory.Rate |
## |Respiratory.Rate..Set. |
## |Respiratory.Rate..spontaneous. |
## |Respiratory.Rate..Total. |
## |SpO2.Desat.Limit |
## |INR |
## |Prothrombin.time |
## |Anion.gap |
## |Creatinine..serum. |
## |Temperature |
## |Potassium..Whole.Blood.2 |
## |Potassium..whole.blood. |
## |Sodium..whole.blood. |
## |Sodium..Whole.Blood |
## |Chloride..Whole.Blood |
## |Chloride..whole.blood. |
## |Bicarbonate |
## |Glucose..whole.blood. |
## |GCS...Eye.Opening |
## |Hemoglobin |
## |Hemoglobin.2 |
## |Hematocrit |
## |Platelet.Count |
## |Acute.myocardial.infarction.of.anterolateral.wall..episode.of.c |
## |Acute.myocardial.infarction.of.anterolateral.wall..initial.epis |
## |Acute.myocardial.infarction.of.anterolateral.wall..subsequent.e |
## |Acute.myocardial.infarction.of.other.anterior.wall..episode.of. |
## |Acute.myocardial.infarction.of.other.anterior.wall..initial.epi |
## |Acute.myocardial.infarction.of.other.anterior.wall..subsequent. |
## |Acute.myocardial.infarction.of.inferolateral.wall..episode.of.c |
## |Acute.myocardial.infarction.of.inferolateral.wall..initial.epis |
## |Acute.myocardial.infarction.of.inferolateral.wall..subsequent.e |
## |Acute.myocardial.infarction.of.inferoposterior.wall..episode.of |
## |Acute.myocardial.infarction.of.inferoposterior.wall..initial.ep |
## |Acute.myocardial.infarction.of.inferoposterior.wall..subsequent |
## |Acute.myocardial.infarction.of.other.inferior.wall..episode.of. |
## |Acute.myocardial.infarction.of.other.inferior.wall..initial.epi |
## |Acute.myocardial.infarction.of.other.inferior.wall..subsequent. |
## |Acute.myocardial.infarction.of.other.lateral.wall..episode.of.c |
## |Acute.myocardial.infarction.of.other.lateral.wall..initial.epis |
## |Acute.myocardial.infarction.of.other.lateral.wall..subsequent.e |
## |Acute.myocardial.infarction.of.other.specified.sites..episode.o |
## |Acute.myocardial.infarction.of.other.specified.sites..initial.e |
## |Acute.myocardial.infarction.of.other.specified.sites..subsequen |
## |Acute.myocardial.infarction.of.unspecified.site..episode.of.car |
## |Acute.myocardial.infarction.of.unspecified.site..initial.episod |
## |Acute.myocardial.infarction.of.unspecified.site..subsequent.epi |
## |Postmyocardial.infarction.syndrome |
## |Acute.coronary.occlusion.without.myocardial.infarction |
## |Old.myocardial.infarction |
## |Certain.sequelae.of.myocardial.infarction..not.elsewhere.classi |
## |Acute.myocardial.infarction |
## |ST.elevation..STEMI..myocardial.infarction.of.anterior.wall |
## |ST.elevation..STEMI..myocardial.infarction.involving.left.main |
## |ST.elevation..STEMI..myocardial.infarction.involving.left.anter |
## |ST.elevation..STEMI..myocardial.infarction.involving.other.coro |
## |ST.elevation..STEMI..myocardial.infarction.of.inferior.wall |
## |ST.elevation..STEMI..myocardial.infarction.involving.right.coro |
## |ST.elevation..STEMI..myocardial.infarction.involving.other.coro.2 |
## |ST.elevation..STEMI..myocardial.infarction.of.other.sites |
## |ST.elevation..STEMI..myocardial.infarction.involving.left.circu |
## |ST.elevation..STEMI..myocardial.infarction.involving.other.site |
## |ST.elevation..STEMI..myocardial.infarction.of.unspecified.site |
## |Non.ST.elevation..NSTEMI..myocardial.infarction |
## |Acute.myocardial.infarction..unspecified |
## |Other.type.of.myocardial.infarction |
## |Myocardial.infarction.type.2 |
## |Other.myocardial.infarction.type |
## |Subsequent.ST.elevation..STEMI..and.non.ST.elevation..NSTEMI..m |
## |Subsequent.ST.elevation..STEMI..myocardial.infarction.of.anteri |
## |Subsequent.ST.elevation..STEMI..myocardial.infarction.of.inferi |
## |Subsequent.non.ST.elevation..NSTEMI..myocardial.infarction |
## |Subsequent.ST.elevation..STEMI..myocardial.infarction.of.other |
## |Subsequent.ST.elevation..STEMI..myocardial.infarction.of.unspec |
## |Certain.current.complications.following.ST.elevation..STEMI..an |
## |Hemopericardium.as.current.complication.following.acute.myocard |
## |Atrial.septal.defect.as.current.complication.following.acute.my |
## |Ventricular.septal.defect.as.current.complication.following.acu |
## |Rupture.of.cardiac.wall.without.hemopericardium.as.current.comp |
## |Rupture.of.chordae.tendineae.as.current.complication.following |
## |Rupture.of.papillary.muscle.as.current.complication.following.a |
## |Thrombosis.of.atrium..auricular.appendage..and.ventricle.as.cur |
## |Other.current.complications.following.acute.myocardial.infarcti |
## |Acute.coronary.thrombosis.not.resulting.in.myocardial.infarctio |
## |Old.myocardial.infarction.2 |
## |Rheumatic.heart.failure..congestive. |
## |Congestive.heart.failure..unspecified |
## |Systolic..congestive..heart.failure |
## |Unspecified.systolic..congestive..heart.failure |
## |Acute.systolic..congestive..heart.failure |
## |Chronic.systolic..congestive..heart.failure |
## |Acute.on.chronic.systolic..congestive..heart.failure |
## |Diastolic..congestive..heart.failure |
## |Unspecified.diastolic..congestive..heart.failure |
## |Acute.diastolic..congestive..heart.failure |
## |Chronic.diastolic..congestive..heart.failure |
## |Acute.on.chronic.diastolic..congestive..heart.failure |
## |Combined.systolic..congestive..and.diastolic..congestive..heart |
## |Unspecified.combined.systolic..congestive..and.diastolic..conge |
## |Acute.combined.systolic..congestive..and.diastolic..congestive. |
## |Chronic.combined.systolic..congestive..and.diastolic..congestiv |
## |Acute.on.chronic.combined.systolic..congestive..and.diastolic.. |
## |Atrial.fibrillation |
## |Atrial.fibrillation.and.flutter |
## |Paroxysmal.atrial.fibrillation |
## |Persistent.atrial.fibrillation |
## |Longstanding.persistent.atrial.fibrillation |
## |Other.persistent.atrial.fibrillation |
## |Chronic.atrial.fibrillation |
## |Chronic.atrial.fibrillation..unspecified |
## |Permanent.atrial.fibrillation |
## |Unspecified.atrial.fibrillation.and.atrial.flutter |
## |Unspecified.atrial.fibrillation |
## |Other.chronic.obstructive.pulmonary.disease |
## |Chronic.obstructive.pulmonary.disease.with..acute..lower.respir |
## |Chronic.obstructive.pulmonary.disease.with..acute..exacerbation |
## |Chronic.obstructive.pulmonary.disease..unspecified |
## |Heat.stroke.and.sunstroke |
## |Brain.stem.stroke.syndrome |
## |Cerebellar.stroke.syndrome |
## |National.Institutes.of.Health.Stroke.Scale..NIHSS..score |
## |Heatstroke.and.sunstroke |
## |Heatstroke.and.sunstroke.2 |
## |Heatstroke.and.sunstroke..initial.encounter |
## |Heatstroke.and.sunstroke..subsequent.encounter |
## |Heatstroke.and.sunstroke..sequela |
## |Exertional.heatstroke |
## |Exertional.heatstroke..initial.encounter |
## |Exertional.heatstroke..subsequent.encounter |
## |Exertional.heatstroke..sequela |
## |Other.heatstroke.and.sunstroke |
## |Other.heatstroke.and.sunstroke..initial.encounter |
## |Other.heatstroke.and.sunstroke..subsequent.encounter |
## |Other.heatstroke.and.sunstroke..sequela |
## |Heatstroke.and.sunstroke..initial.encounter.2 |
## |Heatstroke.and.sunstroke..subsequent.encounter.2 |
## |Heatstroke.and.sunstroke..sequela.2 |
## |Family.history.of.stroke..cerebrovascular. |
## |Family.history.of.stroke |
## |Mixed.hyperlipidemia |
## |Other.and.unspecified.hyperlipidemia |
## |Mixed.hyperlipidemia.2 |
## |Other.hyperlipidemia |
## |Other.hyperlipidemia.2 |
## |Hyperlipidemia..unspecified |
## |Other.chronic.obstructive.pulmonary.disease.2 |
## |Chronic.obstructive.pulmonary.disease.with..acute..lower.respir.2 |
## |Chronic.obstructive.pulmonary.disease.with..acute..exacerbation.2 |
## |Chronic.obstructive.pulmonary.disease..unspecified.2 |
## |Senile.dementia..uncomplicated |
## |Presenile.dementia..uncomplicated |
## |Presenile.dementia.with.delirium |
## |Presenile.dementia.with.delusional.features |
## |Presenile.dementia.with.depressive.features |
## |Senile.dementia.with.delusional.features |
## |Senile.dementia.with.depressive.features |
## |Senile.dementia.with.delirium |
## |Vascular.dementia..uncomplicated |
## |Vascular.dementia..with.delirium |
## |Vascular.dementia..with.delusions |
## |Vascular.dementia..with.depressed.mood |
## |Alcohol.induced.persisting.dementia |
## |Drug.induced.persisting.dementia |
## |Dementia.in.conditions.classified.elsewhere.without.behavioral |
## |Dementia.in.conditions.classified.elsewhere.with.behavioral.dis |
## |Dementia..unspecified..without.behavioral.disturbance |
## |Dementia..unspecified..with.behavioral.disturbance |
## |Other.frontotemporal.dementia |
## |Dementia.with.lewy.bodies |
## |Vascular.dementia |
## |Vascular.dementia.2 |
## |Vascular.dementia.without.behavioral.disturbance |
## |Vascular.dementia.with.behavioral.disturbance |
## |Dementia.in.other.diseases.classified.elsewhere |
## |Dementia.in.other.diseases.classified.elsewhere.2 |
## |Dementia.in.other.diseases.classified.elsewhere.without.behavio |
## |Dementia.in.other.diseases.classified.elsewhere.with.behavioral |
## |Unspecified.dementia |
## |Unspecified.dementia.2 |
## |Unspecified.dementia.without.behavioral.disturbance |
## |Unspecified.dementia.with.behavioral.disturbance |
## |Alcohol.dependence.with.alcohol.induced.persisting.dementia |
## |Alcohol.use..unspecified.with.alcohol.induced.persisting.dement |
## |Sedative..hypnotic.or.anxiolytic.dependence.with.sedative..hypn |
## |Sedative..hypnotic.or.anxiolytic.use..unspecified.with.sedative |
## |Inhalant.abuse.with.inhalant.induced.dementia |
## |Inhalant.dependence.with.inhalant.induced.dementia |
## |Inhalant.use..unspecified.with.inhalant.induced.persisting.deme |
## |Other.psychoactive.substance.abuse.with.psychoactive.substance. |
## |Other.psychoactive.substance.dependence.with.psychoactive.subst |
## |Other.psychoactive.substance.use..unspecified.with.psychoactive |
## |Frontotemporal.dementia |
## |Other.frontotemporal.dementia.2 |
## |Dementia.with.Lewy.bodies |
## |Age.Group |
Comments: The variable names refer to demographic classification, vital-sign measurements, lab tests, and diagnoses describing patient conditions or medical history.
## The number of total missing values is: 37341
## The percentage of mortality is: 0.152 --> In the original dataset.
| variable | missing_count | missing_percentage |
|---|---|---|
| Temperature | 3840 | 60.2164027 |
| Chloride..whole.blood. | 3569 | 55.9667555 |
| Sodium..whole.blood. | 3362 | 52.7207151 |
| Glucose..whole.blood. | 2953 | 46.3070409 |
| Potassium..whole.blood. | 2802 | 43.9391563 |
| Arterial.Blood.Pressure.systolic | 2478 | 38.8583974 |
| Arterial.Blood.Pressure.diastolic | 2477 | 38.8427160 |
| Respiratory.Rate..Set. | 2470 | 38.7329465 |
| Hemoglobin | 2357 | 36.9609534 |
| Respiratory.Rate..spontaneous. | 2300 | 36.0671162 |
| Respiratory.Rate..Total. | 2286 | 35.8475772 |
| Chloride..Whole.Blood | 2267 | 35.5496315 |
| Sodium..Whole.Blood | 2121 | 33.2601537 |
| Potassium..Whole.Blood.2 | 1441 | 22.5968324 |
| INR | 236 | 3.7007997 |
| Prothrombin.time | 236 | 3.7007997 |
| Non.Invasive.Blood.Pressure.systolic | 45 | 0.7056610 |
| Non.Invasive.Blood.Pressure.diastolic | 45 | 0.7056610 |
| Anion.gap | 17 | 0.2665830 |
| Creatinine..serum. | 14 | 0.2195390 |
| SpO2.Desat.Limit | 13 | 0.2038576 |
| Heart.Rate.Alarm...Low | 4 | 0.0627254 |
| Heart.rate.Alarm...High | 3 | 0.0470441 |
| Bicarbonate | 3 | 0.0470441 |
| Respiratory.Rate | 2 | 0.0313627 |
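A table of missing values like the one above can be sketched as follows. The toy data frame `icu` is hypothetical (the column names mimic the report, the values are made up), and this is not necessarily the code used to produce the table.

```r
# Hypothetical toy data; column names mimic the report, values are made up.
icu <- data.frame(
  Temperature = c(36.8, NA, NA, 37.2),
  INR         = c(1.2, 1.4, NA, 1.8),
  Heart.Rate  = c(95, 109, 128, 113)
)

# Count missing values per column and express them as a percentage of rows.
missing_count      <- colSums(is.na(icu))
missing_percentage <- 100 * missing_count / nrow(icu)
missing_tbl <- data.frame(missing_count, missing_percentage)

# Keep only columns with at least one NA, sorted from most to least missing.
missing_tbl <- missing_tbl[missing_tbl$missing_count > 0, ]
missing_tbl <- missing_tbl[order(-missing_tbl$missing_count), ]

total_missing <- sum(is.na(icu))
cat("The number of total missing values is:", total_missing, "\n")
print(missing_tbl)
```

Sorting by the count puts the variables that need the most attention at the top, as in the table above.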
## subject_id gender age mortality
## Min. :10002430 F:2396 Min. :19.00 Alive:5408
## 1st Qu.:12494493 M:3981 1st Qu.:61.00 Death: 969
## Median :14959313 Median :70.00
## Mean :14975796 Mean :69.31
## 3rd Qu.:17439137 3rd Qu.:79.00
## Max. :19997293 Max. :91.00
##
## ethnicity Heart.Rate Heart.rate.Alarm...High
## WHITE :4263 Min. : 43.0 Min. : 60.0
## UNKNOWN : 693 1st Qu.: 95.0 1st Qu.: 120.0
## BLACK/AFRICAN AMERICAN: 428 Median :109.0 Median : 130.0
## OTHER : 180 Mean :113.1 Mean : 260.8
## WHITE - OTHER EUROPEAN: 165 3rd Qu.:128.0 3rd Qu.: 133.8
## WHITE - RUSSIAN : 102 Max. :295.0 Max. :165130.0
## (Other) : 546 NA's :3
## Heart.Rate.Alarm...Low Arterial.Blood.Pressure.systolic
## Min. : 40.0 Min. : 0.0
## 1st Qu.: 50.0 1st Qu.:136.0
## Median : 60.0 Median :150.0
## Mean : 168.8 Mean :154.5
## 3rd Qu.: 60.0 3rd Qu.:168.0
## Max. :60120.0 Max. :742.0
## NA's :4 NA's :2478
## Non.Invasive.Blood.Pressure.systolic Arterial.Blood.Pressure.diastolic
## Min. : 56.0 Min. : 1.0
## 1st Qu.: 135.0 1st Qu.: 69.0
## Median : 153.0 Median : 78.0
## Mean : 160.8 Mean : 174.7
## 3rd Qu.: 171.0 3rd Qu.: 92.0
## Max. :15878.0 Max. :91100.0
## NA's :45 NA's :2477
## Non.Invasive.Blood.Pressure.diastolic Respiratory.Rate Respiratory.Rate..Set.
## Min. : 41.0 Min. : 15.00 Min. : 0.00
## 1st Qu.: 82.0 1st Qu.: 27.00 1st Qu.: 16.00
## Median : 98.0 Median : 32.00 Median : 18.00
## Mean : 215.9 Mean : 34.78 Mean : 20.02
## 3rd Qu.: 115.0 3rd Qu.: 38.00 3rd Qu.: 20.00
## Max. :105125.0 Max. :2037.00 Max. :1618.00
## NA's :45 NA's :2 NA's :2470
## Respiratory.Rate..spontaneous. Respiratory.Rate..Total. SpO2.Desat.Limit
## Min. : 0.00 Min. : 0.00 Min. : 85.00
## 1st Qu.: 13.00 1st Qu.: 18.00 1st Qu.: 85.00
## Median : 21.00 Median : 23.00 Median : 88.00
## Mean : 20.81 Mean : 25.74 Mean : 89.38
## 3rd Qu.: 28.00 3rd Qu.: 29.00 3rd Qu.: 88.00
## Max. :1918.00 Max. :3634.00 Max. :920.00
## NA's :2300 NA's :2286 NA's :13
## INR Prothrombin.time Anion.gap Creatinine..serum.
## Min. : 0.9 Min. : 9.3 Min. : 7.0 Min. : 0.3
## 1st Qu.: 1.2 1st Qu.: 13.5 1st Qu.: 14.0 1st Qu.: 0.9
## Median : 1.4 Median : 15.6 Median : 16.0 Median : 1.2
## Mean : 5049.9 Mean : 4905.8 Mean : 174.5 Mean : 944.9
## 3rd Qu.: 1.8 3rd Qu.: 19.9 3rd Qu.: 20.0 3rd Qu.: 2.1
## Max. :999999.0 Max. :999999.0 Max. :999999.0 Max. :999999.0
## NA's :236 NA's :236 NA's :17 NA's :14
## Temperature Potassium..Whole.Blood.2 Potassium..whole.blood.
## Min. :32.20 Min. : 1.800 Min. : 2.1
## 1st Qu.:36.80 1st Qu.: 4.400 1st Qu.: 4.3
## Median :37.20 Median : 5.000 Median : 4.9
## Mean :37.39 Mean : 5.124 Mean : 1403.6
## 3rd Qu.:37.90 3rd Qu.: 5.600 3rd Qu.: 5.5
## Max. :40.70 Max. :134.000 Max. :999999.0
## NA's :3840 NA's :1441 NA's :2802
## Sodium..whole.blood. Sodium..Whole.Blood Chloride..Whole.Blood
## Min. : 118.0 Min. :115.0 Min. : 71.0
## 1st Qu.: 135.0 1st Qu.:136.0 1st Qu.:103.0
## Median : 137.0 Median :138.0 Median :106.0
## Mean : 800.3 Mean :138.3 Mean :106.1
## 3rd Qu.: 139.0 3rd Qu.:140.0 3rd Qu.:109.0
## Max. :999999.0 Max. :187.0 Max. :139.0
## NA's :3362 NA's :2121 NA's :2267
## Chloride..whole.blood. Bicarbonate Glucose..whole.blood. GCS...Eye.Opening
## Min. : 11.0 Min. :13.0 Min. : 35 1: 72
## 1st Qu.: 104.0 1st Qu.:28.0 1st Qu.: 147 2: 47
## Median : 107.0 Median :30.0 Median : 173 3: 53
## Mean : 462.7 Mean :30.7 Mean : 2312 4:6205
## 3rd Qu.: 109.0 3rd Qu.:33.0 3rd Qu.: 210
## Max. :999999.0 Max. :51.0 Max. :1276100
## NA's :3569 NA's :3 NA's :2953
## Hemoglobin Hemoglobin.2 Hematocrit Platelet.Count
## Min. : 0.00 Min. : 5.10 Min. :18.10 Min. : 9.0
## 1st Qu.:10.70 1st Qu.:12.30 1st Qu.:37.60 1st Qu.: 231.0
## Median :12.10 Median :13.60 Median :41.40 Median : 304.0
## Mean :12.02 Mean :13.52 Mean :41.14 Mean : 339.9
## 3rd Qu.:13.40 3rd Qu.:14.80 3rd Qu.:44.60 3rd Qu.: 408.0
## Max. :97.00 Max. :22.60 Max. :69.70 Max. :2660.0
## NA's :2357
Comments: In this part we should look more closely at the variables that have the highest correlations, using paired scatter plots and cor.test.
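As a minimal sketch of that follow-up, the snippet below computes a correlation matrix and runs `cor.test` on one pair. The data frame `vitals` and its values are hypothetical and serve only to illustrate the calls.

```r
# Hypothetical toy measurements (made up for illustration).
vitals <- data.frame(
  systolic   = c(120, 135, 150, 168, 142, 155),
  diastolic  = c(70, 78, 88, 95, 80, 90),
  heart_rate = c(110, 95, 102, 88, 120, 99)
)

# Pairwise Pearson correlations; each pair uses the rows where both are present.
cor_mat <- cor(vitals, use = "pairwise.complete.obs")
print(round(cor_mat, 2))

# Formal test of the correlation for one pair of variables.
ct <- cor.test(vitals$systolic, vitals$diastolic)
print(ct)

# pairs(vitals) would draw the matrix of paired scatter plots.
```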
## [1] "Chi-Square correlation for gender vs mortality"
## Y
## X Alive Death
## F 1992 404
## M 3416 565
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: stu_data
## X-squared = 8.0629, df = 1, p-value = 0.004518
## [1] "Chi-Square correlation for ethnicity vs mortality"
## Y
## X Alive Death
## AMERICAN INDIAN/ALASKA NATIVE 9 1
## ASIAN 41 10
## ASIAN - ASIAN INDIAN 16 1
## ASIAN - CHINESE 56 8
## ASIAN - KOREAN 3 0
## ASIAN - SOUTH EAST ASIAN 12 4
## BLACK/AFRICAN 11 6
## BLACK/AFRICAN AMERICAN 339 89
## BLACK/CAPE VERDEAN 16 6
## BLACK/CARIBBEAN ISLAND 24 6
## HISPANIC OR LATINO 16 3
## HISPANIC/LATINO - CENTRAL AMERICAN 1 0
## HISPANIC/LATINO - COLUMBIAN 7 0
## HISPANIC/LATINO - CUBAN 4 0
## HISPANIC/LATINO - DOMINICAN 34 6
## HISPANIC/LATINO - GUATEMALAN 6 0
## HISPANIC/LATINO - HONDURAN 6 1
## HISPANIC/LATINO - MEXICAN 2 0
## HISPANIC/LATINO - PUERTO RICAN 61 16
## HISPANIC/LATINO - SALVADORAN 1 1
## MULTIPLE RACE/ETHNICITY 1 0
## NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER 10 0
## OTHER 148 32
## PATIENT DECLINED TO ANSWER 37 2
## PORTUGUESE 12 5
## SOUTH AMERICAN 3 2
## UNABLE TO OBTAIN 42 8
## UNKNOWN 553 140
## WHITE 3687 576
## WHITE - BRAZILIAN 8 0
## WHITE - EASTERN EUROPEAN 17 4
## WHITE - OTHER EUROPEAN 149 16
## WHITE - RUSSIAN 76 26
## Warning in chisq.test(stu_data): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: stu_data
## X-squared = 78.166, df = 32, p-value = 9.777e-06
## [1] "Chi-Square correlation for GCS...Eye.Opening vs mortality"
## Y
## X Alive Death
## 1 6 66
## 2 5 42
## 3 17 36
## 4 5380 825
##
## Pearson's Chi-squared test
##
## data: stu_data
## X-squared = 659.09, df = 3, p-value < 2.2e-16
## [1] "Chi-Square correlation for Age.Group vs mortality"
## Y
## X Alive Death
## 19-35 90 9
## 36-50 382 46
## 51-65 1580 183
## 66-100 3356 731
##
## Pearson's Chi-squared test
##
## data: stu_data
## X-squared = 64.117, df = 3, p-value = 7.749e-14
## [1] "Chi-Square correlation for ethnicity vs gender"
## Y
## X F M
## AMERICAN INDIAN/ALASKA NATIVE 5 5
## ASIAN 17 34
## ASIAN - ASIAN INDIAN 3 14
## ASIAN - CHINESE 29 35
## ASIAN - KOREAN 1 2
## ASIAN - SOUTH EAST ASIAN 6 10
## BLACK/AFRICAN 9 8
## BLACK/AFRICAN AMERICAN 220 208
## BLACK/CAPE VERDEAN 8 14
## BLACK/CARIBBEAN ISLAND 18 12
## HISPANIC OR LATINO 7 12
## HISPANIC/LATINO - CENTRAL AMERICAN 0 1
## HISPANIC/LATINO - COLUMBIAN 4 3
## HISPANIC/LATINO - CUBAN 1 3
## HISPANIC/LATINO - DOMINICAN 19 21
## HISPANIC/LATINO - GUATEMALAN 2 4
## HISPANIC/LATINO - HONDURAN 2 5
## HISPANIC/LATINO - MEXICAN 1 1
## HISPANIC/LATINO - PUERTO RICAN 23 54
## HISPANIC/LATINO - SALVADORAN 0 2
## MULTIPLE RACE/ETHNICITY 0 1
## NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER 5 5
## OTHER 66 114
## PATIENT DECLINED TO ANSWER 9 30
## PORTUGUESE 5 12
## SOUTH AMERICAN 2 3
## UNABLE TO OBTAIN 28 22
## UNKNOWN 229 464
## WHITE 1566 2697
## WHITE - BRAZILIAN 2 6
## WHITE - EASTERN EUROPEAN 13 8
## WHITE - OTHER EUROPEAN 59 106
## WHITE - RUSSIAN 37 65
## Warning in chisq.test(stu_data): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: stu_data
## X-squared = 81.94, df = 32, p-value = 2.93e-06
## [1] "Chi-Square correlation for GCS...Eye.Opening vs gender"
## Y
## X F M
## 1 26 46
## 2 21 26
## 3 30 23
## 4 2319 3886
##
## Pearson's Chi-squared test
##
## data: stu_data
## X-squared = 9.3672, df = 3, p-value = 0.02479
## [1] "Chi-Square correlation for Age.Group vs gender"
## Y
## X F M
## 19-35 36 63
## 36-50 130 298
## 51-65 531 1232
## 66-100 1699 2388
##
## Pearson's Chi-squared test
##
## data: stu_data
## X-squared = 79.129, df = 3, p-value < 2.2e-16
## [1] "Chi-Square correlation for ethnicity vs GCS...Eye.Opening"
## Y
## X 1 2 3 4
## AMERICAN INDIAN/ALASKA NATIVE 0 0 0 10
## ASIAN 3 0 1 47
## ASIAN - ASIAN INDIAN 0 0 0 17
## ASIAN - CHINESE 1 0 1 62
## ASIAN - KOREAN 0 0 0 3
## ASIAN - SOUTH EAST ASIAN 0 1 0 15
## BLACK/AFRICAN 0 0 0 17
## BLACK/AFRICAN AMERICAN 3 4 3 418
## BLACK/CAPE VERDEAN 1 0 0 21
## BLACK/CARIBBEAN ISLAND 1 0 1 28
## HISPANIC OR LATINO 0 1 0 18
## HISPANIC/LATINO - CENTRAL AMERICAN 0 0 0 1
## HISPANIC/LATINO - COLUMBIAN 0 0 0 7
## HISPANIC/LATINO - CUBAN 0 0 0 4
## HISPANIC/LATINO - DOMINICAN 0 0 0 40
## HISPANIC/LATINO - GUATEMALAN 0 0 0 6
## HISPANIC/LATINO - HONDURAN 0 0 0 7
## HISPANIC/LATINO - MEXICAN 0 0 0 2
## HISPANIC/LATINO - PUERTO RICAN 3 0 0 74
## HISPANIC/LATINO - SALVADORAN 0 0 0 2
## MULTIPLE RACE/ETHNICITY 0 0 0 1
## NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER 0 0 0 10
## OTHER 3 1 3 173
## PATIENT DECLINED TO ANSWER 0 0 0 39
## PORTUGUESE 0 0 0 17
## SOUTH AMERICAN 0 0 0 5
## UNABLE TO OBTAIN 0 1 1 48
## UNKNOWN 27 16 11 639
## WHITE 27 22 30 4184
## WHITE - BRAZILIAN 0 0 0 8
## WHITE - EASTERN EUROPEAN 0 0 0 21
## WHITE - OTHER EUROPEAN 1 0 2 162
## WHITE - RUSSIAN 2 1 0 99
## Warning in chisq.test(stu_data): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: stu_data
## X-squared = 143.13, df = 96, p-value = 0.001304
## [1] "Chi-Square correlation for ethnicity vs Age.Group"
## Y
## X 19-35 36-50 51-65 66-100
## AMERICAN INDIAN/ALASKA NATIVE 0 0 3 7
## ASIAN 2 4 14 31
## ASIAN - ASIAN INDIAN 0 1 6 10
## ASIAN - CHINESE 0 5 14 45
## ASIAN - KOREAN 0 0 0 3
## ASIAN - SOUTH EAST ASIAN 1 1 4 10
## BLACK/AFRICAN 1 3 1 12
## BLACK/AFRICAN AMERICAN 20 41 135 232
## BLACK/CAPE VERDEAN 1 1 6 14
## BLACK/CARIBBEAN ISLAND 2 2 10 16
## HISPANIC OR LATINO 0 0 10 9
## HISPANIC/LATINO - CENTRAL AMERICAN 0 0 0 1
## HISPANIC/LATINO - COLUMBIAN 0 0 1 6
## HISPANIC/LATINO - CUBAN 0 0 1 3
## HISPANIC/LATINO - DOMINICAN 3 4 15 18
## HISPANIC/LATINO - GUATEMALAN 0 2 3 1
## HISPANIC/LATINO - HONDURAN 0 0 4 3
## HISPANIC/LATINO - MEXICAN 0 0 1 1
## HISPANIC/LATINO - PUERTO RICAN 2 9 34 32
## HISPANIC/LATINO - SALVADORAN 0 0 1 1
## MULTIPLE RACE/ETHNICITY 1 0 0 0
## NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER 0 1 2 7
## OTHER 5 16 58 101
## PATIENT DECLINED TO ANSWER 0 5 13 21
## PORTUGUESE 0 1 3 13
## SOUTH AMERICAN 0 0 1 4
## UNABLE TO OBTAIN 0 5 13 32
## UNKNOWN 12 43 173 465
## WHITE 44 272 1164 2783
## WHITE - BRAZILIAN 0 2 4 2
## WHITE - EASTERN EUROPEAN 0 2 5 14
## WHITE - OTHER EUROPEAN 4 8 49 104
## WHITE - RUSSIAN 1 0 15 86
## Warning in chisq.test(stu_data): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: stu_data
## X-squared = 240.14, df = 96, p-value = 2.441e-14
## [1] "Chi-Square correlation for GCS...Eye.Opening vs Age.Group"
## Y
## X 19-35 36-50 51-65 66-100
## 1 0 7 15 50
## 2 0 2 11 34
## 3 2 2 10 39
## 4 97 417 1727 3964
## Warning in chisq.test(stu_data): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: stu_data
## X-squared = 10.291, df = 9, p-value = 0.3274
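The chi-square tests above can be reproduced from a contingency table. The sketch below rebuilds the gender vs mortality table from the counts printed in this report; the object names are our own, and the original code may have built the table differently (for example with `table()` on the raw columns).

```r
# Counts copied from the gender vs mortality table in this report.
gender_mortality <- matrix(
  c(1992, 404,
    3416, 565),
  nrow = 2, byrow = TRUE,
  dimnames = list(X = c("F", "M"), Y = c("Alive", "Death"))
)

# For a 2x2 table chisq.test() applies Yates' continuity correction by default.
res <- chisq.test(gender_mortality)
print(res)

# Expected counts under independence. Small expected counts are what trigger
# the "Chi-squared approximation may be incorrect" warning on sparse tables.
print(round(res$expected, 1))
```

For the sparse ethnicity tables, where that warning appears, one option would be `chisq.test(..., simulate.p.value = TRUE)` or merging rare categories before testing.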
In this part we take a careful look at the ranges and limits of each variable, grouping the variables by their nature and clinical significance.
## subject_id gender age mortality
## Min. :10002430 F:2396 Min. :19.00 Alive:5408
## 1st Qu.:12494493 M:3981 1st Qu.:61.00 Death: 969
## Median :14959313 Median :70.00
## Mean :14975796 Mean :69.31
## 3rd Qu.:17439137 3rd Qu.:79.00
## Max. :19997293 Max. :91.00
##
## ethnicity
## WHITE :4263
## UNKNOWN : 693
## BLACK/AFRICAN AMERICAN: 428
## OTHER : 180
## WHITE - OTHER EUROPEAN: 165
## WHITE - RUSSIAN : 102
## (Other) : 546
Comments: The identification variables appear to be clean; they will be used later to split the data and for other kinds of analysis.
## Heart.Rate Heart.rate.Alarm...High Heart.Rate.Alarm...Low
## Min. : 43.0 Min. : 60.0 Min. : 40.0
## 1st Qu.: 95.0 1st Qu.: 120.0 1st Qu.: 50.0
## Median :109.0 Median : 130.0 Median : 60.0
## Mean :113.1 Mean : 260.8 Mean : 168.8
## 3rd Qu.:128.0 3rd Qu.: 133.8 3rd Qu.: 60.0
## Max. :295.0 Max. :165130.0 Max. :60120.0
## NA's :3 NA's :4
Now we check the number of out-of-range values in the Heart Rate group.
## [1] "Out-of-range values Heart Rate Group"
## Heart Rate: 6
## Heart Rate Alarm High: 65
## Heart Rate Alarm Low: 27
In this part, we replace the out-of-range values in the Heart Rate group with missing values (NA), which is why the NA counts in the summary below increase.
## Summary of cleaned Heart Rate Group
## Heart.Rate Heart.rate.Alarm...High Heart.Rate.Alarm...Low
## Min. : 43 Min. : 60.0 Min. : 40.00
## 1st Qu.: 95 1st Qu.:120.0 1st Qu.: 50.00
## Median :109 Median :130.0 Median : 60.00
## Mean :113 Mean :131.1 Mean : 60.06
## 3rd Qu.:128 3rd Qu.:130.0 3rd Qu.: 60.00
## Max. :242 Max. :250.0 Max. :180.00
## NA's :6 NA's :68 NA's :31
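A cleaning step of this kind can be sketched as below. The toy data frame `hr` and the limits in `limits` are assumptions made for illustration only; they are not the thresholds used in this report.

```r
# Hypothetical toy data and limits (assumptions for illustration only).
hr <- data.frame(
  Heart.Rate             = c(43, 109, 295, 128),
  Heart.Rate.Alarm...Low = c(40, 60, 60120, NA)
)
limits <- list(
  Heart.Rate             = c(20, 250),
  Heart.Rate.Alarm...Low = c(20, 200)
)

# Count out-of-range values per column, then set them to NA.
hr_clean <- hr
n_out <- integer(0)
for (col in names(limits)) {
  rng <- limits[[col]]
  x   <- hr_clean[[col]]
  bad <- !is.na(x) & (x < rng[1] | x > rng[2])
  n_out[col] <- sum(bad)
  hr_clean[[col]][bad] <- NA
}

print(n_out)
print(summary(hr_clean))
```

Keeping the limits in a named list makes it easy to adjust a threshold for one variable without touching the loop.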
References: https://www.heart.org/en/healthy-living/fitness/fitness-basics/target-heart-rates https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6220689/
## Arterial.Blood.Pressure.systolic Non.Invasive.Blood.Pressure.systolic
## Min. : 0.0 Min. : 56.0
## 1st Qu.:136.0 1st Qu.: 135.0
## Median :150.0 Median : 153.0
## Mean :154.5 Mean : 160.8
## 3rd Qu.:168.0 3rd Qu.: 171.0
## Max. :742.0 Max. :15878.0
## NA's :2478 NA's :45
## Arterial.Blood.Pressure.diastolic Non.Invasive.Blood.Pressure.diastolic
## Min. : 1.0 Min. : 41.0
## 1st Qu.: 69.0 1st Qu.: 82.0
## Median : 78.0 Median : 98.0
## Mean : 174.7 Mean : 215.9
## 3rd Qu.: 92.0 3rd Qu.: 115.0
## Max. :91100.0 Max. :105125.0
## NA's :2477 NA's :45
Now we check the number of out-of-range values in the Blood Pressure group.
## [1] "Out-of-range values Blood Pressure Group"
## Arterial BP syst: 1
## Non Inv BP syst: 4
## Arterial BP dias: 32
## Non Inv BP dias: 34
Comments: The limits for each variable were chosen graphically by inspecting the scatter plots. We use extreme limits in order to keep outliers that could contain valuable information about the patient's condition.
## Summary of cleaned Blood Pressure Group
## Arterial.Blood.Pressure.systolic Non.Invasive.Blood.Pressure.systolic
## Min. : 0.0 Min. : 56.0
## 1st Qu.:136.0 1st Qu.:135.0
## Median :149.5 Median :153.0
## Mean :154.4 Mean :154.3
## 3rd Qu.:167.8 3rd Qu.:171.0
## Max. :357.0 Max. :321.0
## NA's :2479 NA's :49
## Arterial.Blood.Pressure.diastolic Non.Invasive.Blood.Pressure.diastolic
## Min. : 1.00 Min. : 41.0
## 1st Qu.: 69.00 1st Qu.: 82.0
## Median : 78.00 Median : 97.5
## Mean : 83.43 Mean :100.0
## 3rd Qu.: 92.00 3rd Qu.:115.0
## Max. :259.00 Max. :230.0
## NA's :2509 NA's :79
References: https://www.nursingcenter.com/ncblog/may-2022/non-invasive-blood-pressure#:~:text=Normal%20blood%20pressure%20is%20considered,can%20lead%20to%20inaccurate%20readings.
## Respiratory.Rate Respiratory.Rate..Set. Respiratory.Rate..spontaneous.
## Min. : 15.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 27.00 1st Qu.: 16.00 1st Qu.: 13.00
## Median : 32.00 Median : 18.00 Median : 21.00
## Mean : 34.78 Mean : 20.02 Mean : 20.81
## 3rd Qu.: 38.00 3rd Qu.: 20.00 3rd Qu.: 28.00
## Max. :2037.00 Max. :1618.00 Max. :1918.00
## NA's :2 NA's :2470 NA's :2300
## Respiratory.Rate..Total. SpO2.Desat.Limit
## Min. : 0.00 Min. : 85.00
## 1st Qu.: 18.00 1st Qu.: 85.00
## Median : 23.00 Median : 88.00
## Mean : 25.74 Mean : 89.38
## 3rd Qu.: 29.00 3rd Qu.: 88.00
## Max. :3634.00 Max. :920.00
## NA's :2286 NA's :13
Now we check the number of out-of-range values in the Respiration group.
## [1] "Out-of-range values Respiration Group"
## Respiratory rate: 25
## Respiratory Rate (Set): 11
## Respiratory Rate (spontaneous): 11
## Respiratory Rate (Total): 7
## SpO2 Desat Limit: 18
Comments: A respiratory rate of 120 breaths per minute would be extremely high and generally not sustainable for an extended period in a resting adult. Such a rate would likely indicate severe respiratory distress, significant metabolic demand, or a medical emergency. While it is theoretically possible for a person to reach such a rate briefly, it would be highly abnormal and would warrant immediate medical attention. (Source: ChatGPT)
## Summary of cleaned Respiration Group
## Respiratory.Rate Respiratory.Rate..Set. Respiratory.Rate..spontaneous.
## Min. : 15.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 27.00 1st Qu.:16.00 1st Qu.: 13.00
## Median : 32.00 Median :18.00 Median : 21.00
## Mean : 33.54 Mean :18.55 Mean : 19.71
## 3rd Qu.: 37.00 3rd Qu.:20.00 3rd Qu.: 28.00
## Max. :120.00 Max. :40.00 Max. :101.00
## NA's :27 NA's :2481 NA's :2311
## Respiratory.Rate..Total. SpO2.Desat.Limit
## Min. : 0.00 Min. : 85.00
## 1st Qu.: 18.00 1st Qu.: 85.00
## Median : 23.00 Median : 88.00
## Mean : 24.55 Mean : 87.32
## 3rd Qu.: 29.00 3rd Qu.: 88.00
## Max. :118.00 Max. :100.00
## NA's :2293 NA's :31
References: https://www.whoop.com/us/en/thelocker/what-causes-an-increased-respiratory-rate/
## INR Prothrombin.time
## Min. : 0.9 Min. : 9.3
## 1st Qu.: 1.2 1st Qu.: 13.5
## Median : 1.4 Median : 15.6
## Mean : 5049.9 Mean : 4905.8
## 3rd Qu.: 1.8 3rd Qu.: 19.9
## Max. :999999.0 Max. :999999.0
## NA's :236 NA's :236
Now we check the number of out-of-range values in the Blood Clotting group.
## [1] "Out-of-range values Blood Clotting Group"
## Error Value (999999) INR PT: 61
## INR: 716
## Prothrombin time: 1432
Comments:
- In both variables we found a strange value, ‘999999’. It may be due to a misreading or a malfunction of the device; we can decide later what to do with it.
- For INR we used a value found in the reference as the maximum of the range. However, that value applies to patients under certain treatment, so we should look for additional references.
- For prothrombin time we did not find useful references or values, so we used the 3rd quartile as the limit just to see the outcome.
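One possible way to handle the ‘999999’ value is to treat it as a missing-value code. The sketch below counts it per column and recodes it to NA; the toy data frame `labs` is hypothetical, and recoding to NA is an assumption about the later decision, not something this report has settled.

```r
# Hypothetical toy lab values; 999999 is the error value seen in this report.
labs <- data.frame(
  INR              = c(1.2, 999999, 1.8, NA),
  Prothrombin.time = c(13.5, 999999, 19.9, 15.6)
)
sentinel <- 999999

# Count occurrences of the error value per column, ignoring existing NAs.
n_sentinel <- colSums(labs == sentinel, na.rm = TRUE)
print(n_sentinel)

# Recode the error value as NA; which() skips entries that are already NA.
labs_clean <- labs
labs_clean[] <- lapply(labs, function(x) replace(x, which(x == sentinel), NA))

# The means are no longer dominated by the error value.
print(colMeans(labs_clean, na.rm = TRUE))
```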
## INR Prothrombin.time
## Min. : 0.900 Min. : 9.30
## 1st Qu.: 1.200 1st Qu.: 13.50
## Median : 1.400 Median : 15.60
## Mean : 1.923 Mean : 20.71
## 3rd Qu.: 1.800 3rd Qu.: 19.70
## Max. :27.400 Max. :150.00
## NA's :267 NA's :266
Reference: https://www.mayoclinic.org/tests-procedures/prothrombin-time/about/pac-20384661#:~:text=The%20average%20time%20range%20for,clots%20more%20quickly%20than%20normal.
## Anion.gap Potassium..Whole.Blood.2 Potassium..whole.blood.
## Min. : 7.0 Min. : 1.800 Min. : 2.1
## 1st Qu.: 14.0 1st Qu.: 4.400 1st Qu.: 4.3
## Median : 16.0 Median : 5.000 Median : 4.9
## Mean : 174.5 Mean : 5.124 Mean : 1403.6
## 3rd Qu.: 20.0 3rd Qu.: 5.600 3rd Qu.: 5.5
## Max. :999999.0 Max. :134.000 Max. :999999.0
## NA's :17 NA's :1441 NA's :2802
## Sodium..whole.blood. Sodium..Whole.Blood Chloride..Whole.Blood
## Min. : 118.0 Min. :115.0 Min. : 71.0
## 1st Qu.: 135.0 1st Qu.:136.0 1st Qu.:103.0
## Median : 137.0 Median :138.0 Median :106.0
## Mean : 800.3 Mean :138.3 Mean :106.1
## 3rd Qu.: 139.0 3rd Qu.:140.0 3rd Qu.:109.0
## Max. :999999.0 Max. :187.0 Max. :139.0
## NA's :3362 NA's :2121 NA's :2267
## Chloride..whole.blood. Bicarbonate
## Min. : 11.0 Min. :13.0
## 1st Qu.: 104.0 1st Qu.:28.0
## Median : 107.0 Median :30.0
## Mean : 462.7 Mean :30.7
## 3rd Qu.: 109.0 3rd Qu.:33.0
## Max. :999999.0 Max. :51.0
## NA's :3569 NA's :3
Now we check the number of out-of-range values in the Electrolytes and Acid-Base Balance group.
## [1] "Out-of-range values Electrolytes and Acid-Base Balance Group"
## Error Value (999999) in: Anion.gap = 1
## Error Value (999999) in: Potassium..Whole.Blood.2 = 0
## Error Value (999999) in: Potassium..whole.blood. = 5
## Error Value (999999) in: Sodium..whole.blood. = 2
## Error Value (999999) in: Sodium..Whole.Blood = 0
## Error Value (999999) in: Chloride..Whole.Blood = 0
## Error Value (999999) in: Chloride..whole.blood. = 1
## Error Value (999999) in: Bicarbonate = 0
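The sentinel count above can be sketched roughly as follows. This is a minimal illustration on made-up data, not the code used in the report: the data frame `labs`, the helper names, and the choice to recode the sentinel as `NA` are assumptions (the recoding is consistent with the NA counts rising in the next summary).

```r
# Toy data standing in for two of the electrolyte columns (values are made up).
labs <- data.frame(
  Anion.gap   = c(14, 16, 999999, NA),
  Bicarbonate = c(28, 30, 33, 51)
)

sentinel <- 999999  # error code seen in the summaries

# Count sentinel occurrences per column, ignoring values that are already NA.
sentinel_counts <- sapply(labs, function(x) sum(x == sentinel, na.rm = TRUE))
print(sentinel_counts)

# Recode the sentinel as NA so it no longer distorts the mean and maximum.
labs_clean <- labs
labs_clean[] <- lapply(labs_clean, function(x) replace(x, which(x == sentinel), NA))
summary(labs_clean)
```

Using `which()` inside `replace()` keeps existing `NA` entries from interfering with the comparison.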
Comments:
## Anion.gap Potassium..Whole.Blood.2 Potassium..whole.blood.
## Min. : 7.00 Min. : 1.800 Min. : 2.100
## 1st Qu.: 14.00 1st Qu.: 4.400 1st Qu.: 4.300
## Median : 16.00 Median : 5.000 Median : 4.900
## Mean : 17.31 Mean : 5.124 Mean : 5.006
## 3rd Qu.: 20.00 3rd Qu.: 5.600 3rd Qu.: 5.500
## Max. :157.00 Max. :134.000 Max. :134.000
## NA's :18 NA's :1441 NA's :2807
## Sodium..whole.blood. Sodium..Whole.Blood Chloride..Whole.Blood
## Min. :118.0 Min. :115.0 Min. : 71.0
## 1st Qu.:135.0 1st Qu.:136.0 1st Qu.:103.0
## Median :137.0 Median :138.0 Median :106.0
## Mean :137.1 Mean :138.3 Mean :106.1
## 3rd Qu.:139.0 3rd Qu.:140.0 3rd Qu.:109.0
## Max. :187.0 Max. :187.0 Max. :139.0
## NA's :3364 NA's :2121 NA's :2267
## Chloride..whole.blood. Bicarbonate
## Min. : 11.0 Min. :13.0
## 1st Qu.:104.0 1st Qu.:28.0
## Median :107.0 Median :30.0
## Mean :106.6 Mean :30.7
## 3rd Qu.:109.0 3rd Qu.:33.0
## Max. :141.0 Max. :51.0
## NA's :3570 NA's :3
## [1] "Out-of-range values Electrolytes and Acid-Base Balance Group"
## Anion Gap: 1
## Potassium (Whole Blood 1): 1
## Potassium (Whole Blood 2): 1
## Chloride (Whole Blood 2): 1
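A range check of this kind could look like the sketch below. The limits in `limits` are hypothetical placeholders chosen only for illustration, since the actual thresholds used in the report are not shown; the column name `Potassium` and the helper `out_of_range` are likewise assumptions.

```r
# Hypothetical plausible ranges (lower, upper); NOT the limits used in the report.
limits <- list(
  Anion.gap = c(0, 60),
  Potassium = c(1, 40)
)

# Toy data (values are made up).
labs <- data.frame(
  Anion.gap = c(7, 16, 157, NA),
  Potassium = c(1.8, 5.0, 134, 4.9)
)

# TRUE where a non-missing value falls outside its range.
out_of_range <- function(x, lim) !is.na(x) & (x < lim[1] | x > lim[2])

# Number of out-of-range values per variable.
counts <- sapply(names(limits), function(v) sum(out_of_range(labs[[v]], limits[[v]])))
print(counts)

# Recode out-of-range values as NA rather than dropping whole rows.
labs_clean <- labs
for (v in names(limits)) {
  labs_clean[[v]][out_of_range(labs[[v]], limits[[v]])] <- NA
}
```

Recoding to `NA` keeps the rest of each observation available and leaves the decision about imputation for later.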
Comments:
## Anion.gap Potassium..Whole.Blood.2 Potassium..whole.blood.
## Min. : 7.00 Min. : 1.800 Min. : 2.10
## 1st Qu.:14.00 1st Qu.: 4.400 1st Qu.: 4.30
## Median :16.00 Median : 5.000 Median : 4.90
## Mean :17.29 Mean : 5.098 Mean : 4.97
## 3rd Qu.:20.00 3rd Qu.: 5.600 3rd Qu.: 5.50
## Max. :56.00 Max. :33.000 Max. :33.00
## NA's :19 NA's :1442 NA's :2808
## Sodium..whole.blood. Sodium..Whole.Blood Chloride..Whole.Blood
## Min. :118.0 Min. :115.0 Min. : 71.0
## 1st Qu.:135.0 1st Qu.:136.0 1st Qu.:103.0
## Median :137.0 Median :138.0 Median :106.0
## Mean :137.1 Mean :138.3 Mean :106.1
## 3rd Qu.:139.0 3rd Qu.:140.0 3rd Qu.:109.0
## Max. :187.0 Max. :187.0 Max. :139.0
## NA's :3364 NA's :2121 NA's :2267
## Chloride..whole.blood. Bicarbonate
## Min. : 71.0 Min. :13.0
## 1st Qu.:104.0 1st Qu.:28.0
## Median :107.0 Median :30.0
## Mean :106.7 Mean :30.7
## 3rd Qu.:109.0 3rd Qu.:33.0
## Max. :141.0 Max. :51.0
## NA's :3571 NA's :3
Comments:
## Creatinine..serum. Temperature Glucose..whole.blood.
## Min. : 0.3 Min. :32.20 Min. : 35
## 1st Qu.: 0.9 1st Qu.:36.80 1st Qu.: 147
## Median : 1.2 Median :37.20 Median : 173
## Mean : 944.9 Mean :37.39 Mean : 2312
## 3rd Qu.: 2.1 3rd Qu.:37.90 3rd Qu.: 210
## Max. :999999.0 Max. :40.70 Max. :1276100
## NA's :14 NA's :3840 NA's :2953
## [1] "Out-of-range values for Creatinine (serum), Glucose (Whole Blood), and Temperature"
## Error Value (999999) Creatinine Serum: 6
## Creatinine (serum): 7
## Glucose (Whole Blood): 7
## Creatinine..serum. Temperature Glucose..whole.blood.
## Min. : 0.300 Min. :32.20 Min. : 35.0
## 1st Qu.: 0.900 1st Qu.:36.80 1st Qu.: 147.0
## Median : 1.200 Median :37.20 Median : 173.0
## Mean : 1.969 Mean :37.39 Mean : 187.2
## 3rd Qu.: 2.100 3rd Qu.:37.90 3rd Qu.: 210.0
## Max. :23.000 Max. :40.70 Max. :1183.0
## NA's :21 NA's :3840 NA's :2960
Comments:
## Hemoglobin Hemoglobin.2 Hematocrit Platelet.Count
## Min. : 0.00 Min. : 5.10 Min. :18.10 Min. : 9.0
## 1st Qu.:10.70 1st Qu.:12.30 1st Qu.:37.60 1st Qu.: 231.0
## Median :12.10 Median :13.60 Median :41.40 Median : 304.0
## Mean :12.02 Mean :13.52 Mean :41.14 Mean : 339.9
## 3rd Qu.:13.40 3rd Qu.:14.80 3rd Qu.:44.60 3rd Qu.: 408.0
## Max. :97.00 Max. :22.60 Max. :69.70 Max. :2660.0
## NA's :2357
## [1] "Out-of-range values for Hemoglobin, Hemoglobin.2, Hematocrit, and Platelet.Count"
## Hemoglobin: 2
## Platelet Count: 11
## Summary of cleaned variables
## Hemoglobin Hemoglobin.2 Hematocrit Platelet.Count
## Min. : 0.00 Min. : 5.10 Min. :18.10 Min. : 9.0
## 1st Qu.:10.70 1st Qu.:12.30 1st Qu.:37.60 1st Qu.: 231.0
## Median :12.10 Median :13.60 Median :41.40 Median : 304.0
## Mean :11.98 Mean :13.52 Mean :41.14 Mean : 337.3
## 3rd Qu.:13.40 3rd Qu.:14.80 3rd Qu.:44.60 3rd Qu.: 407.0
## Max. :19.40 Max. :22.60 Max. :69.70 Max. :1328.0
## NA's :2359 NA's :11
Comments:
References: https://www.nhlbi.nih.gov/health/thrombocytopenia#:~:text=A%20normal%20platelet%20count%20in,microliter%20is%20lower%20than%20normal.
## The number of total missing values after ranges cleaning is: 37683
## The percentage of mortality is: 0.152 --> In the dataset with ranges cleaned.
## [1] "subject_id"
## [2] "gender"
## [3] "age"
## [4] "mortality"
## [5] "ethnicity"
## [6] "Heart.Rate"
## [7] "Heart.rate.Alarm...High"
## [8] "Heart.Rate.Alarm...Low"
## [9] "Arterial.Blood.Pressure.systolic"
## [10] "Non.Invasive.Blood.Pressure.systolic"
## [11] "Arterial.Blood.Pressure.diastolic"
## [12] "Non.Invasive.Blood.Pressure.diastolic"
## [13] "Respiratory.Rate"
## [14] "Respiratory.Rate..Set."
## [15] "Respiratory.Rate..spontaneous."
## [16] "Respiratory.Rate..Total."
## [17] "SpO2.Desat.Limit"
## [18] "INR"
## [19] "Prothrombin.time"
## [20] "Anion.gap"
## [21] "Creatinine..serum."
## [22] "Temperature"
## [23] "Potassium..Whole.Blood.2"
## [24] "Potassium..whole.blood."
## [25] "Sodium..whole.blood."
## [26] "Sodium..Whole.Blood"
## [27] "Chloride..Whole.Blood"
## [28] "Chloride..whole.blood."
## [29] "Bicarbonate"
## [30] "Glucose..whole.blood."
## [31] "GCS...Eye.Opening"
## [32] "Hemoglobin"
## [33] "Hemoglobin.2"
## [34] "Hematocrit"
## [35] "Platelet.Count"
## [36] "Acute.myocardial.infarction.of.anterolateral.wall..episode.of.c"
## [37] "Acute.myocardial.infarction.of.anterolateral.wall..initial.epis"
## [38] "Acute.myocardial.infarction.of.anterolateral.wall..subsequent.e"
## [39] "Acute.myocardial.infarction.of.other.anterior.wall..episode.of."
## [40] "Acute.myocardial.infarction.of.other.anterior.wall..initial.epi"
## [41] "Acute.myocardial.infarction.of.other.anterior.wall..subsequent."
## [42] "Acute.myocardial.infarction.of.inferolateral.wall..episode.of.c"
## [43] "Acute.myocardial.infarction.of.inferolateral.wall..initial.epis"
## [44] "Acute.myocardial.infarction.of.inferolateral.wall..subsequent.e"
## [45] "Acute.myocardial.infarction.of.inferoposterior.wall..episode.of"
## [46] "Acute.myocardial.infarction.of.inferoposterior.wall..initial.ep"
## [47] "Acute.myocardial.infarction.of.inferoposterior.wall..subsequent"
## [48] "Acute.myocardial.infarction.of.other.inferior.wall..episode.of."
## [49] "Acute.myocardial.infarction.of.other.inferior.wall..initial.epi"
## [50] "Acute.myocardial.infarction.of.other.inferior.wall..subsequent."
## [51] "Acute.myocardial.infarction.of.other.lateral.wall..episode.of.c"
## [52] "Acute.myocardial.infarction.of.other.lateral.wall..initial.epis"
## [53] "Acute.myocardial.infarction.of.other.lateral.wall..subsequent.e"
## [54] "Acute.myocardial.infarction.of.other.specified.sites..episode.o"
## [55] "Acute.myocardial.infarction.of.other.specified.sites..initial.e"
## [56] "Acute.myocardial.infarction.of.other.specified.sites..subsequen"
## [57] "Acute.myocardial.infarction.of.unspecified.site..episode.of.car"
## [58] "Acute.myocardial.infarction.of.unspecified.site..initial.episod"
## [59] "Acute.myocardial.infarction.of.unspecified.site..subsequent.epi"
## [60] "Postmyocardial.infarction.syndrome"
## [61] "Acute.coronary.occlusion.without.myocardial.infarction"
## [62] "Old.myocardial.infarction"
## [63] "Certain.sequelae.of.myocardial.infarction..not.elsewhere.classi"
## [64] "Acute.myocardial.infarction"
## [65] "ST.elevation..STEMI..myocardial.infarction.of.anterior.wall"
## [66] "ST.elevation..STEMI..myocardial.infarction.involving.left.main"
## [67] "ST.elevation..STEMI..myocardial.infarction.involving.left.anter"
## [68] "ST.elevation..STEMI..myocardial.infarction.involving.other.coro"
## [69] "ST.elevation..STEMI..myocardial.infarction.of.inferior.wall"
## [70] "ST.elevation..STEMI..myocardial.infarction.involving.right.coro"
## [71] "ST.elevation..STEMI..myocardial.infarction.involving.other.coro.2"
## [72] "ST.elevation..STEMI..myocardial.infarction.of.other.sites"
## [73] "ST.elevation..STEMI..myocardial.infarction.involving.left.circu"
## [74] "ST.elevation..STEMI..myocardial.infarction.involving.other.site"
## [75] "ST.elevation..STEMI..myocardial.infarction.of.unspecified.site"
## [76] "Non.ST.elevation..NSTEMI..myocardial.infarction"
## [77] "Acute.myocardial.infarction..unspecified"
## [78] "Other.type.of.myocardial.infarction"
## [79] "Myocardial.infarction.type.2"
## [80] "Other.myocardial.infarction.type"
## [81] "Subsequent.ST.elevation..STEMI..and.non.ST.elevation..NSTEMI..m"
## [82] "Subsequent.ST.elevation..STEMI..myocardial.infarction.of.anteri"
## [83] "Subsequent.ST.elevation..STEMI..myocardial.infarction.of.inferi"
## [84] "Subsequent.non.ST.elevation..NSTEMI..myocardial.infarction"
## [85] "Subsequent.ST.elevation..STEMI..myocardial.infarction.of.other"
## [86] "Subsequent.ST.elevation..STEMI..myocardial.infarction.of.unspec"
## [87] "Certain.current.complications.following.ST.elevation..STEMI..an"
## [88] "Hemopericardium.as.current.complication.following.acute.myocard"
## [89] "Atrial.septal.defect.as.current.complication.following.acute.my"
## [90] "Ventricular.septal.defect.as.current.complication.following.acu"
## [91] "Rupture.of.cardiac.wall.without.hemopericardium.as.current.comp"
## [92] "Rupture.of.chordae.tendineae.as.current.complication.following"
## [93] "Rupture.of.papillary.muscle.as.current.complication.following.a"
## [94] "Thrombosis.of.atrium..auricular.appendage..and.ventricle.as.cur"
## [95] "Other.current.complications.following.acute.myocardial.infarcti"
## [96] "Acute.coronary.thrombosis.not.resulting.in.myocardial.infarctio"
## [97] "Old.myocardial.infarction.2"
## [98] "Rheumatic.heart.failure..congestive."
## [99] "Congestive.heart.failure..unspecified"
## [100] "Systolic..congestive..heart.failure"
## [101] "Unspecified.systolic..congestive..heart.failure"
## [102] "Acute.systolic..congestive..heart.failure"
## [103] "Chronic.systolic..congestive..heart.failure"
## [104] "Acute.on.chronic.systolic..congestive..heart.failure"
## [105] "Diastolic..congestive..heart.failure"
## [106] "Unspecified.diastolic..congestive..heart.failure"
## [107] "Acute.diastolic..congestive..heart.failure"
## [108] "Chronic.diastolic..congestive..heart.failure"
## [109] "Acute.on.chronic.diastolic..congestive..heart.failure"
## [110] "Combined.systolic..congestive..and.diastolic..congestive..heart"
## [111] "Unspecified.combined.systolic..congestive..and.diastolic..conge"
## [112] "Acute.combined.systolic..congestive..and.diastolic..congestive."
## [113] "Chronic.combined.systolic..congestive..and.diastolic..congestiv"
## [114] "Acute.on.chronic.combined.systolic..congestive..and.diastolic.."
## [115] "Atrial.fibrillation"
## [116] "Atrial.fibrillation.and.flutter"
## [117] "Paroxysmal.atrial.fibrillation"
## [118] "Persistent.atrial.fibrillation"
## [119] "Longstanding.persistent.atrial.fibrillation"
## [120] "Other.persistent.atrial.fibrillation"
## [121] "Chronic.atrial.fibrillation"
## [122] "Chronic.atrial.fibrillation..unspecified"
## [123] "Permanent.atrial.fibrillation"
## [124] "Unspecified.atrial.fibrillation.and.atrial.flutter"
## [125] "Unspecified.atrial.fibrillation"
## [126] "Other.chronic.obstructive.pulmonary.disease"
## [127] "Chronic.obstructive.pulmonary.disease.with..acute..lower.respir"
## [128] "Chronic.obstructive.pulmonary.disease.with..acute..exacerbation"
## [129] "Chronic.obstructive.pulmonary.disease..unspecified"
## [130] "Heat.stroke.and.sunstroke"
## [131] "Brain.stem.stroke.syndrome"
## [132] "Cerebellar.stroke.syndrome"
## [133] "National.Institutes.of.Health.Stroke.Scale..NIHSS..score"
## [134] "Heatstroke.and.sunstroke"
## [135] "Heatstroke.and.sunstroke.2"
## [136] "Heatstroke.and.sunstroke..initial.encounter"
## [137] "Heatstroke.and.sunstroke..subsequent.encounter"
## [138] "Heatstroke.and.sunstroke..sequela"
## [139] "Exertional.heatstroke"
## [140] "Exertional.heatstroke..initial.encounter"
## [141] "Exertional.heatstroke..subsequent.encounter"
## [142] "Exertional.heatstroke..sequela"
## [143] "Other.heatstroke.and.sunstroke"
## [144] "Other.heatstroke.and.sunstroke..initial.encounter"
## [145] "Other.heatstroke.and.sunstroke..subsequent.encounter"
## [146] "Other.heatstroke.and.sunstroke..sequela"
## [147] "Heatstroke.and.sunstroke..initial.encounter.2"
## [148] "Heatstroke.and.sunstroke..subsequent.encounter.2"
## [149] "Heatstroke.and.sunstroke..sequela.2"
## [150] "Family.history.of.stroke..cerebrovascular."
## [151] "Family.history.of.stroke"
## [152] "Mixed.hyperlipidemia"
## [153] "Other.and.unspecified.hyperlipidemia"
## [154] "Mixed.hyperlipidemia.2"
## [155] "Other.hyperlipidemia"
## [156] "Other.hyperlipidemia.2"
## [157] "Hyperlipidemia..unspecified"
## [158] "Other.chronic.obstructive.pulmonary.disease.2"
## [159] "Chronic.obstructive.pulmonary.disease.with..acute..lower.respir.2"
## [160] "Chronic.obstructive.pulmonary.disease.with..acute..exacerbation.2"
## [161] "Chronic.obstructive.pulmonary.disease..unspecified.2"
## [162] "Senile.dementia..uncomplicated"
## [163] "Presenile.dementia..uncomplicated"
## [164] "Presenile.dementia.with.delirium"
## [165] "Presenile.dementia.with.delusional.features"
## [166] "Presenile.dementia.with.depressive.features"
## [167] "Senile.dementia.with.delusional.features"
## [168] "Senile.dementia.with.depressive.features"
## [169] "Senile.dementia.with.delirium"
## [170] "Vascular.dementia..uncomplicated"
## [171] "Vascular.dementia..with.delirium"
## [172] "Vascular.dementia..with.delusions"
## [173] "Vascular.dementia..with.depressed.mood"
## [174] "Alcohol.induced.persisting.dementia"
## [175] "Drug.induced.persisting.dementia"
## [176] "Dementia.in.conditions.classified.elsewhere.without.behavioral"
## [177] "Dementia.in.conditions.classified.elsewhere.with.behavioral.dis"
## [178] "Dementia..unspecified..without.behavioral.disturbance"
## [179] "Dementia..unspecified..with.behavioral.disturbance"
## [180] "Other.frontotemporal.dementia"
## [181] "Dementia.with.lewy.bodies"
## [182] "Vascular.dementia"
## [183] "Vascular.dementia.2"
## [184] "Vascular.dementia.without.behavioral.disturbance"
## [185] "Vascular.dementia.with.behavioral.disturbance"
## [186] "Dementia.in.other.diseases.classified.elsewhere"
## [187] "Dementia.in.other.diseases.classified.elsewhere.2"
## [188] "Dementia.in.other.diseases.classified.elsewhere.without.behavio"
## [189] "Dementia.in.other.diseases.classified.elsewhere.with.behavioral"
## [190] "Unspecified.dementia"
## [191] "Unspecified.dementia.2"
## [192] "Unspecified.dementia.without.behavioral.disturbance"
## [193] "Unspecified.dementia.with.behavioral.disturbance"
## [194] "Alcohol.dependence.with.alcohol.induced.persisting.dementia"
## [195] "Alcohol.use..unspecified.with.alcohol.induced.persisting.dement"
## [196] "Sedative..hypnotic.or.anxiolytic.dependence.with.sedative..hypn"
## [197] "Sedative..hypnotic.or.anxiolytic.use..unspecified.with.sedative"
## [198] "Inhalant.abuse.with.inhalant.induced.dementia"
## [199] "Inhalant.dependence.with.inhalant.induced.dementia"
## [200] "Inhalant.use..unspecified.with.inhalant.induced.persisting.deme"
## [201] "Other.psychoactive.substance.abuse.with.psychoactive.substance."
## [202] "Other.psychoactive.substance.dependence.with.psychoactive.subst"
## [203] "Other.psychoactive.substance.use..unspecified.with.psychoactive"
## [204] "Frontotemporal.dementia"
## [205] "Other.frontotemporal.dementia.2"
## [206] "Dementia.with.Lewy.bodies"
## [207] "Age.Group"
## [208] "Myocardial"
## [209] "Rupture"
## [210] "Thrombosis"
## [211] "Systolic"
## [212] "Diastolic"
## [213] "Comb_DS"
## [214] "Fibrillation"
## [215] "PulmonaryDisease"
## [216] "Stroke"
## [217] "Hyperlipidemia"
## [218] "Dementia"
## [1] 6377 47
## [1] "subject_id"
## [2] "gender"
## [3] "age"
## [4] "mortality"
## [5] "ethnicity"
## [6] "Heart.Rate"
## [7] "Heart.rate.Alarm...High"
## [8] "Heart.Rate.Alarm...Low"
## [9] "Arterial.Blood.Pressure.systolic"
## [10] "Non.Invasive.Blood.Pressure.systolic"
## [11] "Arterial.Blood.Pressure.diastolic"
## [12] "Non.Invasive.Blood.Pressure.diastolic"
## [13] "Respiratory.Rate"
## [14] "Respiratory.Rate..Set."
## [15] "Respiratory.Rate..spontaneous."
## [16] "Respiratory.Rate..Total."
## [17] "SpO2.Desat.Limit"
## [18] "INR"
## [19] "Prothrombin.time"
## [20] "Anion.gap"
## [21] "Creatinine..serum."
## [22] "Temperature"
## [23] "Potassium..Whole.Blood.2"
## [24] "Potassium..whole.blood."
## [25] "Sodium..whole.blood."
## [26] "Sodium..Whole.Blood"
## [27] "Chloride..Whole.Blood"
## [28] "Chloride..whole.blood."
## [29] "Bicarbonate"
## [30] "Glucose..whole.blood."
## [31] "GCS...Eye.Opening"
## [32] "Hemoglobin"
## [33] "Hemoglobin.2"
## [34] "Hematocrit"
## [35] "Platelet.Count"
## [36] "Age.Group"
## [37] "Myocardial"
## [38] "Rupture"
## [39] "Thrombosis"
## [40] "Systolic"
## [41] "Diastolic"
## [42] "Comb_DS"
## [43] "Fibrillation"
## [44] "PulmonaryDisease"
## [45] "Stroke"
## [46] "Hyperlipidemia"
## [47] "Dementia"
With a correlation of approximately 0.08325, the Arterial.Blood.Pressure.systolic and Non.Invasive.Blood.Pressure.systolic variables are only very weakly positively correlated. This means they are not strongly related, and consolidating them might not do much to improve data quality. However, both variables measure systolic blood pressure, just by different methods, so we create a new variable by averaging the two.
## [1] TRUE
This code calculates the row-wise mean of the two blood pressure variables and creates a new variable AvgBloodPressureSystolic in the dataset.
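A row-wise mean of this kind might be written as in the sketch below. It uses a toy data frame `bp`, and it assumes that missing values are skipped (`na.rm = TRUE`) so that a patient with only one of the two measurements still gets a value; whether the report's code does this is not shown.

```r
# Toy data: each row is a patient; either measurement may be missing.
bp <- data.frame(
  Arterial.Blood.Pressure.systolic     = c(120, NA, 110, NA),
  Non.Invasive.Blood.Pressure.systolic = c(118, 125, NA, NA)
)

# Row-wise mean, ignoring NAs within a row.
avg <- rowMeans(bp, na.rm = TRUE)

# rowMeans() returns NaN when every value in a row is NA; recode that as NA.
avg[is.nan(avg)] <- NA

bp$AvgBloodPressureSystolic <- avg
print(bp)
```

With this convention the averaged column is missing only when both source measurements are missing.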
With a correlation of approximately -0.0009740, the Arterial.Blood.Pressure.diastolic and Non.Invasive.Blood.Pressure.diastolic variables are essentially uncorrelated. As with the systolic blood pressure variables, consolidating them might not do much to improve data quality or reduce redundancy. However, both variables measure diastolic blood pressure, just by different methods, so we create a new variable by averaging the two.
## [1] TRUE
This code calculates the row-wise mean of the two blood pressure variables and creates a new variable AvgBloodPressureDiastolic in the dataset.
| | Respiratory.Rate..Set. | Respiratory.Rate..spontaneous. | Respiratory.Rate..Total. | Respiratory.Rate |
|---|---|---|---|---|
| Respiratory.Rate..Set. | 1.0000000 | 0.2653949 | 0.5113597 | 0.3130305 |
| Respiratory.Rate..spontaneous. | 0.2653949 | 1.0000000 | 0.6906674 | 0.3578672 |
| Respiratory.Rate..Total. | 0.5113597 | 0.6906674 | 1.0000000 | 0.3994472 |
| Respiratory.Rate | 0.3130305 | 0.3578672 | 0.3994472 | 1.0000000 |
Even though they are not strongly correlated, we can still consolidate them into a single variable, because all of these variables are different ways of measuring or recording the same respiratory rate.
## [1] TRUE
This code calculates the row-wise mean of the four respiratory rate variables, creating a new variable ConsolidatedRespiratoryRate.
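For reference, a correlation matrix like the one shown for the respiratory rate variables can be obtained along these lines. The data are made up, only three of the four columns are included for brevity, and the use of `"pairwise.complete.obs"` is an assumption about how missing values were handled.

```r
# Toy data with some missing readings (values are made up).
rr <- data.frame(
  Respiratory.Rate..Set.   = c(14, 16, NA, 18, 20, 12),
  Respiratory.Rate..Total. = c(15, 18, 22, NA, 24, 13),
  Respiratory.Rate         = c(16, 17, 21, 19, 23, 14)
)

# Pearson correlations, using every pair of rows where both values are present.
cor_mat <- cor(rr, use = "pairwise.complete.obs")
print(round(cor_mat, 3))
```

With pairwise deletion each cell may be based on a different subset of patients, which is worth keeping in mind when many values are missing.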
## [1] TRUE
| | missing_count | missing_percentage |
|---|---|---|
| Temperature | 3840 | 60.2164027 |
| Glucose..whole.blood. | 2960 | 46.4168104 |
| AvgChloride | 2265 | 35.5182688 |
| AvgSodium | 2119 | 33.2287910 |
| AvgPotassium | 1442 | 22.6125137 |
| INR | 267 | 4.1869218 |
| Prothrombin.time | 266 | 4.1712404 |
| Heart.rate.Alarm...High | 68 | 1.0663321 |
| Heart.Rate.Alarm...Low | 31 | 0.4861220 |
| SpO2.Desat.Limit | 31 | 0.4861220 |
| Creatinine..serum. | 21 | 0.3293085 |
| Anion.gap | 19 | 0.2979457 |
| AvgBloodPressureDiastolic | 18 | 0.2822644 |
| Platelet.Count | 11 | 0.1724949 |
| Heart.Rate | 6 | 0.0940881 |
| AvgBloodPressureSystolic | 4 | 0.0627254 |
| Bicarbonate | 3 | 0.0470441 |
| ConsolidatedRespiratoryRate | 2 | 0.0313627 |
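A missing-value table of this shape can be built with a few lines of base R. The sketch below uses a small made-up data frame `df`; the column names `missing_count` and `missing_percentage` follow the table above, while everything else is illustrative.

```r
# Toy data (values are made up).
df <- data.frame(
  Temperature = c(NA, 37.2, NA, NA),
  INR         = c(1.2, NA, 1.4, 1.8),
  Heart.Rate  = c(80, 75, 90, 88)
)

# Number of NAs per column and the share of rows they represent.
missing_count <- colSums(is.na(df))
missing_tbl <- data.frame(
  missing_count      = missing_count,
  missing_percentage = 100 * missing_count / nrow(df)
)

# Keep only columns with at least one NA, sorted from most to least missing.
missing_tbl <- missing_tbl[missing_tbl$missing_count > 0, ]
missing_tbl <- missing_tbl[order(-missing_tbl$missing_count), ]
print(missing_tbl)
```

As a check against the table above, 3840 missing Temperature values out of 6377 observations gives 60.2%.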
After combining some variables, the missing-value counts change as follows:
| | Before_Missing_Count |
|---|---|
| Arterial.Blood.Pressure.systolic | 2478 |
| Non.Invasive.Blood.Pressure.systolic | 45 |
| Arterial.Blood.Pressure.diastolic | 2477 |
| Non.Invasive.Blood.Pressure.diastolic | 45 |
| Respiratory.Rate..Set. | 2470 |
| Respiratory.Rate..spontaneous. | 2300 |
| Respiratory.Rate..Total. | 2286 |
| Respiratory.Rate | 2 |
| Chloride..whole.blood. | 3569 |
| Chloride..Whole.Blood | 2267 |
| Sodium..whole.blood. | 3362 |
| Sodium..Whole.Blood | 2121 |
| Potassium..whole.blood. | 2802 |
| Potassium..Whole.Blood.2 | 1441 |
| Hemoglobin | 2357 |
| Hemoglobin.2 | 0 |
| | After_Missing_Count |
|---|---|
| AvgBloodPressureSystolic | 2 |
| AvgBloodPressureDiastolic | 2 |
| ConsolidatedRespiratoryRate | 0 |
| AvgChloride | 2265 |
| AvgSodium | 2119 |
| AvgPotassium | 1441 |
| AvgHemoglobin | 0 |
To handle missing values in the mimic_iv_clean data set for the variables related to lab tests and vital signs, we run a statistical test before imputation:
The idea is to impute missing values based on similar cases. To that end, we calculate the proportion of missing values for each variable by age group, gender, and ethnicity, using only the variables with the highest percentage of missing values.
Why we considered Age, Gender, and Ethnicity:
Age: Age can be an important factor in healthcare data analysis as different age groups may have different patterns of missingness. For example, certain medical tests or measurements may be more common or relevant in specific age groups, leading to different rates of missing data.
Gender: Gender can also play a role in healthcare and medical data. Some conditions or tests may be more prevalent or important for one gender than the other, leading to differences in missing data patterns.
Ethnicity: Ethnicity can be associated with various health factors and medical conditions, which could influence the presence or absence of certain variables in the data set.
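The per-group missingness check described above can be sketched as follows: compute the proportion of missing values within each group, then test whether missingness is independent of the group with Pearson's chi-squared test. The data frame, the column `Glucose`, and the indicator `glucose_missing` are made up for illustration and are not the report's objects.

```r
# Toy data: 50 women and 50 men, with different shares of missing glucose.
df <- data.frame(
  gender  = factor(rep(c("F", "M"), each = 50)),
  Glucose = c(rep(NA, 30), rep(150, 20),   # 60% missing among F
              rep(NA, 15), rep(160, 35))   # 30% missing among M
)

# Indicator of missingness for the variable of interest.
df$glucose_missing <- is.na(df$Glucose)

# Proportion of missing values within each group.
missing_rate <- tapply(df$glucose_missing, df$gender, mean)
print(missing_rate)

# Contingency table (group x missing), then the chi-squared test of independence.
tab <- table(df$gender, df$glucose_missing)
test <- chisq.test(tab)  # 2x2 tables get Yates' continuity correction by default
print(test)
```

A small p-value indicates that the missing rate differs between groups; it does not by itself say how large or how clinically relevant the difference is.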
##
## Summary table for Temperature
## # A tibble: 4 × 4
## Age.Group min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 19-35 0.535 0.535 0.535
## 2 36-50 0.603 0.603 0.603
## 3 51-65 0.584 0.584 0.584
## 4 66-100 0.612 0.612 0.612
## [1] "Chi-square test result for Temperature"
##
## Pearson's Chi-squared test
##
## data: chi_sq_variable
## X-squared = 5.9139, df = 3, p-value = 0.1159
##
## Summary table for Glucose..whole.blood.
## # A tibble: 4 × 4
## Age.Group min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 19-35 0.545 0.545 0.545
## 2 36-50 0.393 0.393 0.393
## 3 51-65 0.372 0.372 0.372
## 4 66-100 0.510 0.510 0.510
## [1] "Chi-square test result for Glucose..whole.blood."
##
## Pearson's Chi-squared test
##
## data: chi_sq_variable
## X-squared = 106.31, df = 3, p-value < 2.2e-16
##
## Summary table for AvgChloride
## # A tibble: 4 × 4
## Age.Group min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 19-35 0.434 0.434 0.434
## 2 36-50 0.285 0.285 0.285
## 3 51-65 0.279 0.279 0.279
## 4 66-100 0.393 0.393 0.393
## [1] "Chi-square test result for AvgChloride"
##
## Pearson's Chi-squared test
##
## data: chi_sq_variable
## X-squared = 82.618, df = 3, p-value < 2.2e-16
##
## Summary table for AvgSodium
## # A tibble: 4 × 4
## Age.Group min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 19-35 0.404 0.404 0.404
## 2 36-50 0.266 0.266 0.266
## 3 51-65 0.260 0.260 0.260
## 4 66-100 0.368 0.368 0.368
## [1] "Chi-square test result for AvgSodium"
##
## Pearson's Chi-squared test
##
## data: chi_sq_variable
## X-squared = 75.938, df = 3, p-value = 2.281e-16
##
## Summary table for AvgPotassium
## # A tibble: 4 × 4
## Age.Group min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 19-35 0.273 0.273 0.273
## 2 36-50 0.171 0.171 0.171
## 3 51-65 0.175 0.175 0.175
## 4 66-100 0.253 0.253 0.253
## [1] "Chi-square test result for AvgPotassium"
##
## Pearson's Chi-squared test
##
## data: chi_sq_variable
## X-squared = 51.396, df = 3, p-value = 4.029e-11
##
## Summary table for Temperature
## # A tibble: 2 × 4
## gender min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 F 0.591 0.591 0.591
## 2 M 0.609 0.609 0.609
## [1] "Chi-square test result for Temperature"
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: chi_sq_variable
## X-squared = 2.0776, df = 1, p-value = 0.1495
##
## Summary table for Glucose..whole.blood.
## # A tibble: 2 × 4
## gender min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 F 0.553 0.553 0.553
## 2 M 0.411 0.411 0.411
## [1] "Chi-square test result for Glucose..whole.blood."
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: chi_sq_variable
## X-squared = 120.07, df = 1, p-value < 2.2e-16
##
## Summary table for AvgChloride
## # A tibble: 2 × 4
## gender min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 F 0.430 0.430 0.430
## 2 M 0.310 0.310 0.310
## [1] "Chi-square test result for AvgChloride"
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: chi_sq_variable
## X-squared = 94.036, df = 1, p-value < 2.2e-16
##
## Summary table for AvgSodium
## # A tibble: 2 × 4
## gender min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 F 0.403 0.403 0.403
## 2 M 0.290 0.290 0.290
## [1] "Chi-square test result for AvgSodium"
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: chi_sq_variable
## X-squared = 85.388, df = 1, p-value < 2.2e-16
##
## Summary table for AvgPotassium
## # A tibble: 2 × 4
## gender min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 F 0.283 0.283 0.283
## 2 M 0.192 0.192 0.192
## [1] "Chi-square test result for AvgPotassium"
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: chi_sq_variable
## X-squared = 71.397, df = 1, p-value < 2.2e-16
##
## Summary table for Temperature
## # A tibble: 33 × 4
## ethnicity min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 AMERICAN INDIAN/ALASKA NATIVE 0.5 0.5 0.5
## 2 ASIAN 0.627 0.627 0.627
## 3 ASIAN - ASIAN INDIAN 0.824 0.824 0.824
## 4 ASIAN - CHINESE 0.531 0.531 0.531
## 5 ASIAN - KOREAN 0.333 0.333 0.333
## 6 ASIAN - SOUTH EAST ASIAN 0.438 0.438 0.438
## 7 BLACK/AFRICAN 0.647 0.647 0.647
## 8 BLACK/AFRICAN AMERICAN 0.474 0.474 0.474
## 9 BLACK/CAPE VERDEAN 0.545 0.545 0.545
## 10 BLACK/CARIBBEAN ISLAND 0.533 0.533 0.533
## # ℹ 23 more rows
## Warning in chisq.test(chi_sq_variable): Chi-squared approximation may be
## incorrect
## [1] "Chi-square test result for Temperature"
##
## Pearson's Chi-squared test
##
## data: chi_sq_variable
## X-squared = 85.386, df = 32, p-value = 9.503e-07
##
## Summary table for Glucose..whole.blood.
## # A tibble: 33 × 4
## ethnicity min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 AMERICAN INDIAN/ALASKA NATIVE 0.8 0.8 0.8
## 2 ASIAN 0.608 0.608 0.608
## 3 ASIAN - ASIAN INDIAN 0.471 0.471 0.471
## 4 ASIAN - CHINESE 0.562 0.562 0.562
## 5 ASIAN - KOREAN 0.667 0.667 0.667
## 6 ASIAN - SOUTH EAST ASIAN 0.438 0.438 0.438
## 7 BLACK/AFRICAN 0.824 0.824 0.824
## 8 BLACK/AFRICAN AMERICAN 0.549 0.549 0.549
## 9 BLACK/CAPE VERDEAN 0.591 0.591 0.591
## 10 BLACK/CARIBBEAN ISLAND 0.6 0.6 0.6
## # ℹ 23 more rows
## Warning in chisq.test(chi_sq_variable): Chi-squared approximation may be
## incorrect
## [1] "Chi-square test result for Glucose..whole.blood."
##
## Pearson's Chi-squared test
##
## data: chi_sq_variable
## X-squared = 73.816, df = 32, p-value = 3.768e-05
##
## Summary table for AvgChloride
## # A tibble: 33 × 4
## ethnicity min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 AMERICAN INDIAN/ALASKA NATIVE 0.8 0.8 0.8
## 2 ASIAN 0.510 0.510 0.510
## 3 ASIAN - ASIAN INDIAN 0.471 0.471 0.471
## 4 ASIAN - CHINESE 0.375 0.375 0.375
## 5 ASIAN - KOREAN 0.333 0.333 0.333
## 6 ASIAN - SOUTH EAST ASIAN 0.25 0.25 0.25
## 7 BLACK/AFRICAN 0.471 0.471 0.471
## 8 BLACK/AFRICAN AMERICAN 0.329 0.329 0.329
## 9 BLACK/CAPE VERDEAN 0.5 0.5 0.5
## 10 BLACK/CARIBBEAN ISLAND 0.467 0.467 0.467
## # ℹ 23 more rows
## Warning in chisq.test(chi_sq_variable): Chi-squared approximation may be
## incorrect
## [1] "Chi-square test result for AvgChloride"
##
## Pearson's Chi-squared test
##
## data: chi_sq_variable
## X-squared = 74.873, df = 32, p-value = 2.726e-05
##
## Summary table for AvgSodium
## # A tibble: 33 × 4
## ethnicity min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 AMERICAN INDIAN/ALASKA NATIVE 0.8 0.8 0.8
## 2 ASIAN 0.451 0.451 0.451
## 3 ASIAN - ASIAN INDIAN 0.412 0.412 0.412
## 4 ASIAN - CHINESE 0.328 0.328 0.328
## 5 ASIAN - KOREAN 0.333 0.333 0.333
## 6 ASIAN - SOUTH EAST ASIAN 0.312 0.312 0.312
## 7 BLACK/AFRICAN 0.529 0.529 0.529
## 8 BLACK/AFRICAN AMERICAN 0.290 0.290 0.290
## 9 BLACK/CAPE VERDEAN 0.5 0.5 0.5
## 10 BLACK/CARIBBEAN ISLAND 0.433 0.433 0.433
## # ℹ 23 more rows
## Warning in chisq.test(chi_sq_variable): Chi-squared approximation may be
## incorrect
## [1] "Chi-square test result for AvgSodium"
##
## Pearson's Chi-squared test
##
## data: chi_sq_variable
## X-squared = 69.65, df = 32, p-value = 0.0001313
##
## Summary table for AvgPotassium
## # A tibble: 33 × 4
## ethnicity min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 AMERICAN INDIAN/ALASKA NATIVE 0.6 0.6 0.6
## 2 ASIAN 0.333 0.333 0.333
## 3 ASIAN - ASIAN INDIAN 0.353 0.353 0.353
## 4 ASIAN - CHINESE 0.156 0.156 0.156
## 5 ASIAN - KOREAN 0 0 0
## 6 ASIAN - SOUTH EAST ASIAN 0.25 0.25 0.25
## 7 BLACK/AFRICAN 0.294 0.294 0.294
## 8 BLACK/AFRICAN AMERICAN 0.143 0.143 0.143
## 9 BLACK/CAPE VERDEAN 0.273 0.273 0.273
## 10 BLACK/CARIBBEAN ISLAND 0.233 0.233 0.233
## # ℹ 23 more rows
## Warning in chisq.test(chi_sq_variable): Chi-squared approximation may be
## incorrect
## [1] "Chi-square test result for AvgPotassium"
##
## Pearson's Chi-squared test
##
## data: chi_sq_variable
## X-squared = 91.469, df = 32, p-value = 1.234e-07
##
## Summary table for Temperature
## # A tibble: 2 × 4
## mortality min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 Alive 0.658 0.658 0.658
## 2 Death 0.291 0.291 0.291
## [1] "Chi-square test result for Temperature"
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: chi_sq_variable
## X-squared = 460.22, df = 1, p-value < 2.2e-16
##
## Summary table for Glucose..whole.blood.
## # A tibble: 2 × 4
## mortality min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 Alive 0.460 0.460 0.460
## 2 Death 0.489 0.489 0.489
## [1] "Chi-square test result for Glucose..whole.blood."
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: chi_sq_variable
## X-squared = 2.7531, df = 1, p-value = 0.09707
##
## Summary table for AvgChloride
## # A tibble: 2 × 4
## mortality min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 Alive 0.345 0.345 0.345
## 2 Death 0.412 0.412 0.412
## [1] "Chi-square test result for AvgChloride"
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: chi_sq_variable
## X-squared = 15.682, df = 1, p-value = 7.492e-05
##
## Summary table for AvgSodium
## # A tibble: 2 × 4
## mortality min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 Alive 0.325 0.325 0.325
## 2 Death 0.373 0.373 0.373
## [1] "Chi-square test result for AvgSodium"
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: chi_sq_variable
## X-squared = 8.1352, df = 1, p-value = 0.004341
##
## Summary table for AvgPotassium
## # A tibble: 2 × 4
## mortality min_missing max_missing mean_missing
## <fct> <dbl> <dbl> <dbl>
## 1 Alive 0.232 0.232 0.232
## 2 Death 0.196 0.196 0.196
## [1] "Chi-square test result for AvgPotassium"
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: chi_sq_variable
## X-squared = 5.6942, df = 1, p-value = 0.01702
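Based on the summary tables, we conclude that the demographic factors Age.Group, Gender, Ethnicity, and Mortality are not strongly associated with the missingness of these variables in practical terms. Note, however, that most of the chi-square tests above are statistically significant; the exceptions are Temperature by age group (p = 0.1159) and by gender (p = 0.1495), and Glucose..whole.blood. by mortality (p = 0.09707). Some ethnicity categories also show markedly different missing rates, although the chi-squared approximation warnings suggest that several of these categories are small.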
Due to a high number of missing values and its lack of significance in the reference document, the Temperature variable has been removed from the data set.
Although the missingness of ‘Glucose..whole.blood.’ differs only weakly between patients who died and those who survived (p = 0.097, not significant at the 5% level), the reference document considers it the most important factor. Therefore, we keep this feature but remove the observations in which ‘Glucose..whole.blood.’ is missing.
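Removing the observations with a missing glucose value amounts to a row filter. The sketch below uses a small made-up data frame (`toy` and its values are assumptions, only the column name comes from the report):

```r
# Made-up example data; only the column name is taken from the report.
toy <- data.frame(
  subject_id            = 1:5,
  Glucose..whole.blood. = c(110, NA, 95, NA, 140)
)

# Keep only the rows where glucose was actually measured.
cleaned <- toy[!is.na(toy$Glucose..whole.blood.), , drop = FALSE]

nrow(toy)      # 5 rows before filtering
nrow(cleaned)  # 3 rows after filtering
```

Filtering rows this way reduces the sample size, so it is worth recording the row count before and after the step.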
Since the anion gap is calculated from the concentrations of sodium (Na), chloride (Cl), and bicarbonate (HCO3) in mmol/L, we will retain only the anion gap feature and drop the sodium and chloride features. Bicarbonate is considered important in the reference document, so we will keep it after imputing its missing values. The other lab tests with a low percentage of missing values (INR, Prothrombin.time, Anion.gap, and Creatinine..serum.) will have their missing values imputed with the median. Potassium, which has a high percentage of missing values and is not considered an important feature, will be dropped.
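The decisions in this paragraph can be sketched in a few lines of base R. The data frame is made up, the helper `impute_median` is a hypothetical name, and two points are assumptions rather than statements from the report: that the dropped columns are the `AvgSodium`, `AvgChloride`, and `AvgPotassium` features tested above, and that Bicarbonate is imputed with the median like the other lab tests (the text does not name the method). The arithmetic line uses the common form of the anion gap, Na − (Cl + HCO3).

```r
# Made-up lab values in mmol/L (INR is unitless).
toy <- data.frame(
  AvgSodium    = c(140, 138, NA),
  AvgChloride  = c(104, NA, 101),
  AvgPotassium = c(4.1, NA, NA),
  Bicarbonate  = c(24, NA, 22),
  INR          = c(1.1, NA, 1.3),
  Anion.gap    = c(12, 14, NA)
)

# Common form of the anion gap: Na - (Cl + HCO3).
140 - (104 + 24)  # 12, matching Anion.gap in the first row

# Drop the features that are redundant with the anion gap or too sparse.
drop_cols <- c("AvgSodium", "AvgChloride", "AvgPotassium")
toy <- toy[, !(names(toy) %in% drop_cols), drop = FALSE]

# Hypothetical helper: replace NA with the median of the observed values.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Assumption: Bicarbonate is imputed with the median as well.
impute_cols <- c("Bicarbonate", "INR", "Anion.gap")
toy[impute_cols] <- lapply(toy[impute_cols], impute_median)

print(toy)
```

The median is less sensitive to extreme lab values than the mean, which is one reason it is a common default for skewed clinical measurements.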
Check for missing values
| Variable | Missing values |
|---|---|
| subject_id | 0 |
| gender | 0 |
| age | 0 |
| mortality | 0 |
| ethnicity | 0 |
| Heart.Rate | 4 |
| Heart.rate.Alarm...High | 0 |
| Heart.Rate.Alarm...Low | 0 |
| SpO2.Desat.Limit | 0 |
| INR | 0 |
| Prothrombin.time | 0 |
| Anion.gap | 0 |
| Creatinine..serum. | 0 |
| Bicarbonate | 0 |
| Glucose..whole.blood. | 0 |
| GCS...Eye.Opening | 0 |
| Hematocrit | 0 |
| Platelet.Count | 7 |
| Age.Group | 0 |
| Myocardial | 0 |
| Rupture | 0 |
| Thrombosis | 0 |
| Systolic | 0 |
| Diastolic | 0 |
| Comb_DS | 0 |
| Fibrillation | 0 |
| PulmonaryDisease | 0 |
| Stroke | 0 |
| Hyperlipidemia | 0 |
| Dementia | 0 |
| AvgBloodPressureSystolic | 0 |
| AvgBloodPressureDiastolic | 0 |
| ConsolidatedRespiratoryRate | 0 |
| AvgHemoglobin | 0 |
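A per-column count of missing values like the one in this table can be obtained with `colSums(is.na(...))`. The sketch below runs on a made-up data frame and is only one possible way to produce such a check:

```r
# Made-up data with a few gaps; column names are taken from the report.
toy <- data.frame(
  Heart.Rate     = c(80, NA, 75, 90),
  Platelet.Count = c(210, 180, NA, NA),
  INR            = c(1.1, 1.2, 1.0, 1.3)
)

# is.na() gives a logical matrix; colSums() counts the TRUE values per column.
missing_counts <- colSums(is.na(toy))
print(missing_counts)

# Columns that still need attention before modelling.
names(missing_counts)[missing_counts > 0]
```

Running the same check again after each cleaning step makes it easy to confirm that every count has reached zero.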
Check for missing values again, after handling the remaining gaps in Heart.Rate and Platelet.Count
| Variable | Missing values |
|---|---|
| subject_id | 0 |
| gender | 0 |
| age | 0 |
| mortality | 0 |
| ethnicity | 0 |
| Heart.Rate | 0 |
| Heart.rate.Alarm...High | 0 |
| Heart.Rate.Alarm...Low | 0 |
| SpO2.Desat.Limit | 0 |
| INR | 0 |
| Prothrombin.time | 0 |
| Anion.gap | 0 |
| Creatinine..serum. | 0 |
| Bicarbonate | 0 |
| Glucose..whole.blood. | 0 |
| GCS...Eye.Opening | 0 |
| Hematocrit | 0 |
| Platelet.Count | 0 |
| Age.Group | 0 |
| Myocardial | 0 |
| Rupture | 0 |
| Thrombosis | 0 |
| Systolic | 0 |
| Diastolic | 0 |
| Comb_DS | 0 |
| Fibrillation | 0 |
| PulmonaryDisease | 0 |
| Stroke | 0 |
| Hyperlipidemia | 0 |
| Dementia | 0 |
| AvgBloodPressureSystolic | 0 |
| AvgBloodPressureDiastolic | 0 |
| ConsolidatedRespiratoryRate | 0 |
| AvgHemoglobin | 0 |